Analyzing and Fixing “Failed to Template String” Errors in Ansible Playbooks

I’ve lost count of the 3 a.m. pages where a playbook stops dead with “Failed to template string.” The first time it happened, I stared at a wall of Jinja2 tracebacks, convinced I’d somehow broken Python itself. Since then, I’ve learned to stop guessing and peel back the layers fast. This post walks through exactly how I troubleshoot Ansible Jinja2 template string errors now: no fluff, just what works.

Quick Summary

  • How to read the stack trace so you can target the correct task.
  • The three primary causes of the error: an undefined variable, mismatched brackets, and colliding YAML and Jinja2 quotes.
  • How to isolate the failing task, enable verbose output, and use the debug module to catch bad values early.
  • How to debug edge cases involving filter plugins and lookup plugins.
  • How to set up syntax checks and dry runs so the same errors never reach production again.

Understanding “Failed to Template String” Messages

Before hunting for a logic error, check the versions in your Ansible execution environment. Incompatible Jinja2 or Python versions on the control node often produce misleading error messages.

Defining the Execution Environment Prerequisites

Whenever I set up a new Ansible environment, I start by confirming the version numbers. This quick step keeps me from chasing phantom errors. If you don’t know your versions, run ansible --version on the control node. Recent ansible-core releases report the Python and Jinja2 versions in use alongside the Ansible version. If the Jinja2 version is missing or reported as 2.x or older, expect far more templating quirks than with a current 3.x release. The output also includes a libyaml line. libyaml = True means PyYAML is using the faster C LibYAML bindings. libyaml = False means it has fallen back to the slower pure-Python parser.
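
The two tasks below are one way to keep this information in your run logs. They run ansible --version on the control node once per play and print the result. This is a minimal sketch that assumes the ansible CLI is on the control node’s PATH. control_versions is a hypothetical variable name.

```yaml
# Sketch: capture control-node versions at the start of a play.
# Assumes the `ansible` CLI is on the control node's PATH.
- name: capture ansible, python, and jinja versions
  ansible.builtin.command: ansible --version
  delegate_to: localhost
  run_once: true
  changed_when: false          # read-only command, never report "changed"
  check_mode: false            # still run during --check dry runs
  register: control_versions   # hypothetical variable name

- name: show version details
  ansible.builtin.debug:
    var: control_versions.stdout_lines
  run_once: true
```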

Parsing the Ansible Stack Trace

A templating error is not a mysterious failure of the whole playbook. The error output tells you exactly which task and which variable triggered it.

TASK [app_deploy : Render nginx config from template] ********************************
fatal: [web01]: FAILED! => {"msg": "AnsibleUndefinedVariable: 'port_number' is undefined. 
  The error occurred while evaluating the template string: 'listen {{ port_number }};'. 
  The error was: 'port_number' is undefined"}

The output names the task (Render nginx config from template) and the failing expression (listen {{ port_number }};). At this point I don’t dig deeper into the traceback. I go straight to the task file and check which variable it references. The remaining lines only restate what the template engine already reported.

Core Triggers: Undefined Variables, Bracket Mismatches, and Quoting

Almost every template error I have resolved in production comes down to one of three issues: an undefined variable, a missing curly brace, or a clash between YAML quoting and Jinja2 quoting.

Diagnosing the Undefined Variable Exception

You defined a variable in group_vars, but Ansible still reports it as undefined. The cause is usually one of these: a misspelled variable name, a precedence problem, or a group_vars file for a group the host does not belong to.

$ ansible-playbook -i inventory/prod site.yml -vvv
...
TASK [deploy_app : Restart application service] ***********************************
fatal: [app01]: FAILED! => {"msg": "The task includes an option with an undefined variable. 
  The error was: 'app_port' is undefined. 
  'app_port' is undefined\n\nThe error appears to be in '/home/jdoe/playbooks/roles/deploy_app/tasks/main.yml': 
  line 12, column 3, but may be elsewhere in the file depending on the exact syntax problem...."}

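To investigate, I rerun with -vvv, which adds the file path, line number, and part of the raw template to the output. I also add a debug task just before the failing one. A debug task that uses var reports VARIABLE IS NOT DEFINED! instead of failing, so the play runs long enough to show you the gap.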

- name: show hostvars for this host
  debug:
    var: hostvars[inventory_hostname]['app_port']

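Sometimes the variable is defined in host_vars/app01.yml, but the task sees an empty or unexpected value. In that case, look for a higher-precedence source overriding it. Role vars/main.yml, play vars, vars_files, set_fact, and extra vars all beat host_vars. A value in group_vars/all.yml cannot override host_vars, because it sits lower in the order. Ansible’s precedence order has more than twenty levels and is far from intuitive, which makes it an easy source of frustration. When in doubt, use debug: var=vars to display every variable currently available to the task.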
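
To see why a particular value wins, you can print the host’s group memberships next to the value the task will actually see. Group membership matters because it decides which group_vars files apply. This is a minimal sketch using the group_names magic variable. app_port is the variable from the error above, and the default() call exists only so the debug output shows UNDEFINED instead of failing.

```yaml
# Sketch: print group membership (which decides which group_vars apply)
# next to the value this task will actually see for app_port.
- name: show group membership and the effective app_port
  ansible.builtin.debug:
    msg:
      - "groups: {{ group_names }}"
      - "app_port: {{ app_port | default('UNDEFINED') }}"
```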

Mistake I made: before I understood variable definitions and precedence rules, I sprinkled |default('') across multiple playbooks just to keep them running. That hides misconfigurations and lets invalid template files render without a single error message. Now I add a default only after I have verified that the data flow is intentional.
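
Here is a minimal sketch of that rule in practice. It assumes app_port must always come from inventory, while app_log_level (a hypothetical name) is genuinely optional. The mandatory filter fails immediately when a required value is missing. default is reserved for the optional value.

```yaml
# Sketch: fail loudly on required values, default only deliberate optionals.
# app_log_level is a hypothetical optional variable.
- name: render values with explicit intent
  ansible.builtin.debug:
    msg: >-
      port={{ app_port | mandatory }}
      log_level={{ app_log_level | default('info') }}
```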

Fixing Bracket Mismatch Errors

A single missing closing brace leaves Jinja2 with an expression it cannot parse.

# BROKEN – missing closing braces
- name: echo the list
  debug:
    msg: "{{ item.name "     # fails with "unexpected end of template"
  loop:
    - { name: alice }
    - { name: bob }

# CORRECT
- name: echo the list
  debug:
    msg: "{{ item.name }}"   # properly closed
  loop:
    - { name: alice }
    - { name: bob }

When I’m writing conditionals such as {% if ... %} against the clock, I check three pairings: every {% has a matching %}, every {{ has a matching }}, and every {% if %} has a closing {% endif %}. The Jinja2 error usually identifies the problem with phrases such as unexpected token or unexpected end of template, often with the line where the parser gave up.
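
Here is a small sketch of a balanced inline conditional that you can count delimiters against. app_env is a hypothetical variable. The default('dev') call is only there to keep the sketch self-contained, not a recommendation.

```yaml
# Sketch: each {% if %} closes with {% endif %}, each {{ with }}.
# app_env is hypothetical; default('dev') keeps the example self-contained.
- name: pick a log level per environment
  ansible.builtin.debug:
    msg: "{% if app_env | default('dev') == 'prod' %}warn{% else %}debug{% endif %} for {{ inventory_hostname }}"
```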

Resolving Quote Escaping Issues in YAML

If you nest a double-quoted Jinja2 string literal inside a double-quoted YAML value, the inner quote ends the YAML string early and the parser fails. The easiest fix is to alternate quote styles so the two layers never compete.

# FAIL – double quotes nested inside double quotes
- name: set fact with colon
  set_fact:
    acme: "{{ "challenge:token" }}"   # YAML ends the string at the second double quote

# PASS – alternate quote styles
- name: set fact safely
  set_fact:
    acme: '{{ "challenge:token" }}'   # single-quoted YAML protects the colon and inner quotes

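If you must stay inside double quotes, you have two options. Use single quotes for the Jinja2 string literal, or escape the inner quotes as \" (YAML double-quoted strings honour backslash escapes). The | quote filter is a different tool: it shell-quotes a value, so it only helps when the result ends up in a shell command. Better still, store the raw string in a variable and let the variable expand, instead of fighting nested quotes inline.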
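
The following is a sketch of the variable approach, plus a folded block scalar, which also avoids the quoting problem. acme_challenge and acme_block are hypothetical names. Task-level vars are used only to keep the example self-contained; in a real role the string would more likely live in vars or group_vars.

```yaml
# Sketch: keep an awkward literal in a variable so the task line stays simple.
# acme_challenge and acme_block are hypothetical variable names.
- name: set fact from a variable instead of an inline literal
  vars:
    acme_challenge: 'challenge:token with "quotes" inside'
  ansible.builtin.set_fact:
    acme: "{{ acme_challenge }}"

# A folded block scalar sidesteps YAML quoting entirely.
- name: set fact with a block scalar
  ansible.builtin.set_fact:
    acme_block: >-
      {{ "challenge:token" }}
```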

How to Troubleshoot Ansible Jinja2 Template String Errors

When a playbook has 150 tasks, finding the one that broke the template engine can take a long time. I work through it with a tight three-step loop.

Isolating the Failing Playbook Task

Instead of rerunning the entire playbook, I use --start-at-task to jump straight to the failing task. When necessary, I add --step to confirm each task interactively before it runs.

$ ansible-playbook -i inventory/prod site.yml --start-at-task="deploy_app : Restart application service" -vvv

This saves me from rerunning the whole playbook when the only goal is to reproduce the failure. My loop for debugging playbooks and templates looks like this:

  1. Copy the task name exactly as it appears in the stack trace and pass it to --start-at-task.
  2. Limit the run to the failing host with --limit, and keep -vvv on so the failing expression is printed in full.
  3. Use the debug module to check what registered results and lookups actually return before the template task consumes them.

Never assume a registered result or a lookup value looks the way you expect. Put a debug task immediately before the template task and check the value live.
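
Here is a minimal sketch of that check. The file path /opt/app/RELEASE and the variable release_info are hypothetical. The env lookup simply stands in for whatever lookup your template depends on.

```yaml
# Sketch: inspect a registered result and a lookup value before a template
# task consumes them. The file path and variable name are hypothetical.
- name: read the current release marker
  ansible.builtin.command: cat /opt/app/RELEASE
  register: release_info
  changed_when: false

- name: show everything that was registered
  ansible.builtin.debug:
    var: release_info

- name: show what a lookup returns on the control node
  ansible.builtin.debug:
    msg: "{{ lookup('ansible.builtin.env', 'HOME') }}"
```

Dumping the whole registered variable, rather than just .stdout, also shows rc, stderr, and whether the task was skipped. Those fields are often the real reason a later template sees an unexpected value.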

Edge Cases with Filter and Lookup Plugins

Standard templating mistakes are easy to spot. Complex data transformations and secrets pulled from external services, however, produce much stranger failures.

Handling Dictionary Filter Plugin Crashes

The json_query filter is powerful and flexible, but it is also one of the most fragile filters. A mistyped JMESPath expression does not simply return nothing. It raises a parse error that fails the task on that host. Note that json_query ships in the community.general collection and needs the jmespath Python library on the control node.

# FAILS – unclosed bracket in the JMESPath expression
- name: extract internal IPs
  debug:
    msg: "{{ cluster_nodes | json_query('[*.private_ip') }}"   # JMESPath parse error fails the task

# WORKS
- name: extract internal IPs correctly
  debug:
    msg: "{{ cluster_nodes | json_query('[*].private_ip') }}"  # [*] projects private_ip from every element

For longer expressions, I test the expression with jp or a quick Python script before pasting it into the playbook. The Jinja2 template designer documentation does not cover JMESPath at all, so the validation has to happen in isolation.
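
You can also validate an expression in isolation without leaving Ansible. Run it against a small fixture with assert before pointing it at real inventory data. This is a minimal sketch: sample_nodes is a hypothetical fixture, and the task assumes community.general and the jmespath library are installed on the control node.

```yaml
# Sketch: check a JMESPath expression against known sample data first.
# sample_nodes is a hypothetical fixture.
- name: check the JMESPath expression against sample data
  vars:
    sample_nodes:
      - { name: node1, private_ip: 10.0.0.11 }
      - { name: node2, private_ip: 10.0.0.12 }
  ansible.builtin.assert:
    that:
      - "(sample_nodes | community.general.json_query('[*].private_ip')) == ['10.0.0.11', '10.0.0.12']"
  run_once: true
```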

Mitigating Unreachable Lookup Plugins

Lookups such as lookup('hashi_vault', ...) or lookup('aws_secret', ...) run on the control node while the template string is being rendered. If the backend service is unreachable, the lookup raises an error and the entire template string fails with it, not just the secret.

- name: fetch secret from vault (may fail)
  set_fact:
    db_password: "{{ lookup('hashi_vault', 'secret=secret/db cred=token') | default('!vault_down!', true) }}"

To limit the damage from unreachable backends, I pipe every remote lookup through a default value: | default('fallback', true). The second argument, true, tells default to replace empty or falsy results as well as undefined ones. On its own, default does not catch a lookup that raises an error. Where I want a soft failure, I also pass errors='warn' (or errors='ignore') to the lookup, which makes it return an empty result that default then replaces. That way a temporary network blip doesn’t take down the entire run. For anything critical, I wrap the task in a block and log the failure in the rescue section. I leave ignore_errors: yes off that task, because an ignored failure never triggers rescue.
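
Here is a sketch of the block-and-rescue pattern. It reuses the hashi_vault lookup and the !vault_down! placeholder from the example above, so the secret path and options are the same illustrative values. The rescue section only runs if a task inside the block fails, which is why this sketch does not set ignore_errors.

```yaml
# Sketch: reuses the illustrative lookup and placeholder from the example above.
- name: fetch database password with a logged fallback
  block:
    - name: read secret from Vault
      ansible.builtin.set_fact:
        db_password: "{{ lookup('hashi_vault', 'secret=secret/db cred=token') }}"
  rescue:
    - name: log why the lookup failed
      ansible.builtin.debug:
        msg: "Vault lookup failed: {{ ansible_failed_result.msg | default('no message') }}"
    - name: set a placeholder that later tasks can detect
      ansible.builtin.set_fact:
        db_password: "!vault_down!"
```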

Automating Syntax Validation and Prevention

You do not need to wait until the playbook is running to identify templating errors. I integrate static syntax validation directly into my Continuous Integration (CI) pipeline.

Enforcing Strict Syntax Validation

ansible-lint applies a collection of rules, including checks on Jinja2 expressions, before anything is changed on a server.

$ ansible-lint playbooks/site.yml --profile=min -x yaml
WARNING  Overriding detected file kind 'yaml' with 'playbook' for given position
playbooks/site.yml:12: [jinja[spacing]] Too many spaces before colon in Jinja2 expression
playbooks/site.yml:24: [template-instead-of-shell] Use the template module instead of inline jinja in shell

I run this as a pre-commit hook. The jinja rule reports unparsable expressions as jinja[invalid] and inconsistent spacing inside delimiters as jinja[spacing]. As a result, many bracket mistakes surface at commit time instead of as a cryptic “failed to template” error at run time. The ansible-lint documentation describes every available rule in detail.
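
A minimal .pre-commit-config.yaml for that hook might look like the sketch below. The rev value is only an example pin. Set it to the ansible-lint release you have actually tested.

```yaml
# .pre-commit-config.yaml (sketch)
repos:
  - repo: https://github.com/ansible/ansible-lint
    rev: v24.2.0          # example pin; use the release you have tested
    hooks:
      - id: ansible-lint
```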

Dry‑Run Testing with Check Mode

Check mode (--check) makes no changes, but template expressions in the tasks it runs are still evaluated. That makes it an ideal final step.

$ ansible-playbook -i inventory/prod site.yml --check --diff -vv

With --diff, you see each changed line of the rendered file in your terminal. I look for lines where a value rendered empty (e.g. listen ;), which means a variable was never given a real value. A dry run with no templating errors is a strong sign that the real run will template cleanly too, though not a guarantee. Tasks that don’t support check mode, such as command and shell, are skipped, so anything that depends on their registered output is not fully exercised.
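
check_mode and diff can also be set on a single task, which is handy when only one template is suspect. Here is a minimal sketch with hypothetical src and dest paths. With check_mode: true on the task, this template is never written, even during a normal run.

```yaml
# Sketch: always dry-run this one template and show its diff.
# The src and dest paths are hypothetical.
- name: Render nginx config from template
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  check_mode: true   # never write the file, even in a normal run
  diff: true         # print the rendered changes
```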

Frequently Asked Questions

Why does Ansible say “undefined variable” when I defined it in host_vars?

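Variable precedence and file naming are the usual culprits. A host may belong to several groups whose group_vars files set the same variable differently. In that case, group ordering decides which value wins: child groups beat parents, and groups at the same level merge alphabetically unless ansible_group_priority says otherwise. host_vars files must also be named after the inventory hostname exactly as it appears in the inventory, not the ansible_host address or the machine’s FQDN. They must also live in a host_vars directory next to the inventory or the playbook. To verify, I add debug: var=vars before the failing task and check which value appears in the output.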
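
The sketch below shows the naming rule in a YAML inventory. It assumes inventory/prod is a directory containing this file. The key under hosts is the inventory hostname, so that host’s variables belong in inventory/prod/host_vars/app01.yml, not in a file named after its IP address. The group name and address are hypothetical.

```yaml
# inventory/prod/hosts.yml (sketch; group name and address are hypothetical)
all:
  children:
    app:
      hosts:
        app01:                      # inventory hostname: vars go in host_vars/app01.yml
          ansible_host: 10.0.0.21   # connection address: NOT used to locate host_vars
```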

How do I correctly escape special characters in complex Jinja2 templates?

If you’re emitting a string that contains Jinja2 delimiters (such as a literal {{), wrap it in {% raw %}…{% endraw %}. For strings with embedded quotes, alternate quote styles or escape the inner quotes with backslashes. The | quote filter shell-quotes a value, so reserve it for strings that end up in shell commands. When a template needs a lot of escaping, I move the string into a dedicated variable (vaulted if it is a secret). This keeps the template clean and the YAML valid.
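
Here is a minimal sketch of {% raw %} in a task. A literal double-brace placeholder, such as one destined for a Helm-style template, passes through untouched, while a real Ansible variable in the same string is still rendered. The .Values.image placeholder is just an illustrative literal.

```yaml
# Sketch: pass literal double braces through untouched while still
# rendering a real Ansible variable in the same string.
- name: show a literal placeholder next to a rendered variable
  ansible.builtin.debug:
    msg: "{% raw %}{{ .Values.image }}{% endraw %} on {{ inventory_hostname }}"
```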

Can I force Ansible to ignore an undefined variable and continue execution?

Yes, but only do it deliberately. Use the default filter to supply a known fallback: {{ undefined_var | default('safe_placeholder') }}. If the variable should be optional across an entire role, give it a value in that role’s defaults/main.yml. You can also turn off undefined-variable errors globally by setting error_on_undefined_vars = False in the [defaults] section of ansible.cfg (the DEFAULT_UNDEFINED_VAR_BEHAVIOR setting). Be very wary of this option: it suppresses these failures everywhere and can mask real configuration problems. The |default filter with a meaningful fallback, as described in the Ansible filters documentation, is the best way to keep a playbook running without hiding configuration issues.
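
Here is a minimal sketch of role defaults for the deploy_app role used earlier. Role defaults have the lowest precedence, so inventory, play vars, and extra vars all override them. app_worker_count and app_log_level are hypothetical variables.

```yaml
# roles/deploy_app/defaults/main.yml (sketch)
# Lowest precedence: inventory, play vars, and extra vars override these.
app_worker_count: 4   # hypothetical optional setting
app_log_level: info   # hypothetical optional setting
```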
